
    Contribution to privacy-enhancing technologies for machine learning applications

    For some time now, big data applications have been driving revolutionary innovation in every aspect of daily life by exploiting the large amounts of data generated by users' interactions with technology. Supported by machine learning and unprecedented computing capabilities, many entities can exploit such data efficiently to obtain significant utility. However, because personal information is involved, these practices raise serious privacy concerns. Although multiple privacy protection mechanisms have been proposed, several challenges must be addressed before they can be adopted in practice, i.e., before they become “usable” beyond the privacy guarantee they offer. First, the real impact of privacy protection mechanisms on data utility is unclear, so an empirical evaluation of that impact is crucial. Moreover, since privacy is commonly obtained by perturbing large data sets, usable privacy technologies may require not only the preservation of data utility but also algorithms that are efficient in terms of computation time. Satisfying both requirements is key to encouraging the adoption of privacy initiatives. Although considerable effort has been devoted to designing less “destructive” privacy mechanisms, the utility metrics employed may be inappropriate, in which case the effectiveness of such mechanisms would be measured incorrectly. On the other hand, despite the advent of big data, more efficient approaches are rarely considered. Failing to meet the requirements of current applications may hinder the adoption of privacy technologies. In the first part of this thesis, we address the problem of measuring the effect of k-anonymous microaggregation on the empirical utility of microdata. We quantify utility as the accuracy of classification models learned from microaggregated data and evaluated on the original test data.
Our experiments show that the impact of the de facto standard microaggregation algorithm on the performance of machine learning algorithms is often minor for a variety of data sets. Furthermore, experimental evidence suggests that the traditional distortion measure used in the microdata anonymization community may be inappropriate for evaluating the utility of microaggregated data. Second, we address the problem of preserving the empirical utility of data. Our approach, based on linear discriminant analysis, transforms the original data records into a different data space, which enables k-anonymous microaggregation to be adapted to the application domain of the data. To do this, the data are first rotated (projected) towards the direction of maximum discrimination and then scaled in this direction, penalizing distortion across the classification threshold. As a result, data utility, measured as the accuracy of machine-learned models, is preserved for a number of standardized data sets. Next, we propose a mechanism to reduce the running time of the k-anonymous microaggregation algorithm by simplifying its internal operations. Extensive experimentation over multiple data sets shows that the new algorithm is significantly faster, and that this speedup is achieved with no additional loss of data utility. Finally, in a more applied approach, we propose a tool that protects the privacy of individuals and organizations by anonymizing sensitive data included in security logs. Different anonymization mechanisms are designed and implemented according to the definition of a privacy policy, in the context of a European project that aims to build a unified security system.
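The evaluation pipeline described in the thesis abstract above can be sketched in a few lines: k-anonymous microaggregation, a traditional distortion measure, and utility measured as the accuracy of a classifier trained on microaggregated data and tested on original data. This is a minimal sketch under stated assumptions: the "de facto standard" is assumed to be MDAV (the method the prepartitioning abstract in this listing names as its underlying method), the traditional distortion measure is assumed to be SSE/SST, and a nearest-class-mean classifier stands in for the thesis's machine learning models. All function names and the synthetic data are hypothetical.

```python
import random

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mdav_groups(data, k):
    """Partition record indices into groups of at least k records (MDAV heuristic).

    Assumes len(data) >= k.
    """
    remaining = list(range(len(data)))
    groups = []

    def take_nearest(anchor):
        # Form a group with the k remaining records closest to `anchor`.
        group = sorted(remaining, key=lambda i: sq_dist(data[i], anchor))[:k]
        for i in group:
            remaining.remove(i)
        groups.append(group)

    while len(remaining) >= 3 * k:
        c = centroid([data[i] for i in remaining])
        r = max(remaining, key=lambda i: sq_dist(data[i], c))        # farthest from centroid
        s = max(remaining, key=lambda i: sq_dist(data[i], data[r]))  # farthest from r
        take_nearest(data[r])
        take_nearest(data[s])
    if len(remaining) >= 2 * k:
        c = centroid([data[i] for i in remaining])
        r = max(remaining, key=lambda i: sq_dist(data[i], c))
        take_nearest(data[r])
    if remaining:
        groups.append(list(remaining))  # last group has between k and 2k - 1 records
    return groups

def microaggregate(data, k):
    """Replace every record by the centroid of its MDAV group."""
    masked = [None] * len(data)
    for group in mdav_groups(data, k):
        c = centroid([data[i] for i in group])
        for i in group:
            masked[i] = c
    return masked

def sse_over_sst(original, masked):
    """Distortion: within-group sum of squares normalized by the total sum of squares."""
    grand = centroid(original)
    sse = sum(sq_dist(x, y) for x, y in zip(original, masked))
    sst = sum(sq_dist(x, grand) for x in original)
    return sse / sst

def nearest_mean_accuracy(train_x, train_y, test_x, test_y):
    """Empirical utility: fit class means on (masked) training data, score on original test data."""
    labels = sorted(set(train_y))
    means = {c: centroid([x for x, y in zip(train_x, train_y) if y == c]) for c in labels}
    hits = sum(min(labels, key=lambda c: sq_dist(x, means[c])) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)

if __name__ == "__main__":
    rng = random.Random(0)

    def sample(label, n):
        # Hypothetical two-class data: two Gaussian blobs in 2-D.
        mu = (0.0, 0.0) if label == 0 else (2.0, 2.0)
        return [(tuple(rng.gauss(m, 1.0) for m in mu), label) for _ in range(n)]

    rows = sample(0, 200) + sample(1, 200)
    rng.shuffle(rows)
    train, test = rows[:300], rows[300:]
    train_x, train_y = [x for x, _ in train], [y for _, y in train]
    test_x, test_y = [x for x, _ in test], [y for _, y in test]
    for k in (1, 5, 20):
        masked = microaggregate(train_x, k)
        print(k, round(sse_over_sst(train_x, masked), 3),
              nearest_mean_accuracy(masked, train_y, test_x, test_y))
```

Printing both measures for several values of k lets one compare the distortion measure with the accuracy-based utility on the same masked data; the sketch itself makes no claim about the magnitudes the thesis reports.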

    Postprint (published version)
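The LDA-based transformation in the thesis abstract above can be illustrated for the two-class case: compute Fisher's discriminant direction w = S_w^{-1}(m_1 - m_0), where S_w is the pooled within-class scatter matrix, then stretch every record along that unit direction by a factor greater than 1, so that distance-based microaggregation pays more for grouping records that lie on opposite sides of the class boundary. For distances, scaling the projected component in place is equivalent to rotating onto w, scaling that axis, and rotating back. This is a sketch under assumptions: the two-class restriction, the stretch factor, and every function name are hypothetical and not the thesis's exact formulation.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a square, non-singular)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fisher_direction(xs, ys):
    """Unit vector along w = S_w^{-1} (m1 - m0) for labels in {0, 1}."""
    groups = {0: [x for x, y in zip(xs, ys) if y == 0],
              1: [x for x, y in zip(xs, ys) if y == 1]}
    means = {c: mean(pts) for c, pts in groups.items()}
    d = len(xs[0])
    sw = [[0.0] * d for _ in range(d)]  # pooled within-class scatter matrix
    for c, pts in groups.items():
        for x in pts:
            diff = [a - b for a, b in zip(x, means[c])]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    w = solve(sw, [b - a for a, b in zip(means[0], means[1])])
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def stretch_along(xs, u, factor):
    """Map x -> x + (factor - 1) (u . x) u, i.e. scale only the component along unit vector u."""
    out = []
    for x in xs:
        proj = sum(a * b for a, b in zip(x, u))
        out.append(tuple(a + (factor - 1.0) * proj * b for a, b in zip(x, u)))
    return out

if __name__ == "__main__":
    # Hypothetical 2-D training records with binary labels.
    xs = [(0.0, 0.0), (1.0, 1.2), (2.0, 1.9), (1.0, 2.5), (2.0, 3.4), (3.0, 4.1)]
    ys = [0, 0, 0, 1, 1, 1]
    u = fisher_direction(xs, ys)
    stretched = stretch_along(xs, u, factor=4.0)  # microaggregate these instead of xs
    print(u)
    print(stretched)
```

Because stretching by `factor` and then by `1 / factor` along the same unit vector is the identity, centroids computed in the stretched space can be mapped back to the original attribute space with `stretch_along(centroids, u, 1 / factor)`.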

    Online advertising: analysis of privacy threats and protection approaches

    Online advertising, the pillar of the “free” content on the Web, has revolutionized the marketing business in recent years by creating a myriad of new opportunities for advertisers to reach potential customers. The current advertising model builds upon an intricate infrastructure composed of a variety of intermediary entities and technologies whose main aim is to deliver personalized ads. For this purpose, a wealth of user data is collected, aggregated, processed and traded behind the scenes at an unprecedented rate. Despite the enormous value of online advertising, however, the intrusiveness and ubiquity of these practices prompt serious privacy concerns. This article surveys the online advertising infrastructure and its supporting technologies, and presents a thorough overview of the underlying privacy risks and the solutions that may mitigate them. We first analyze the threats and potential privacy attackers in this online advertising scenario. In particular, we examine the main components of the advertising infrastructure in terms of tracking capabilities, data collection, aggregation level and privacy risk, and review the tracking and data-sharing technologies employed by these components. Then, we conduct a comprehensive survey of the most relevant privacy mechanisms, and classify and compare them on the basis of their privacy guarantees and impact on the Web. Peer Reviewed. Postprint (author's final draft).

    Anonymizing cybersecurity data in critical infrastructures: the CIPSEC approach

    Cybersecurity logs are continuously generated by network devices to describe security incidents. With modern computing technology, such logs can be exploited to counter threats in real time or before they gain a foothold. To improve these capabilities, logs are usually shared with external entities. However, since cybersecurity logs may contain sensitive data, serious privacy concerns arise, especially when critical infrastructures (CIs), which handle strategic data, are involved. We propose a tool that protects privacy by anonymizing sensitive data included in cybersecurity logs. We implement anonymization mechanisms, grouped through the definition of a privacy policy. We adapt this approach to the context of the EU project CIPSEC, which builds a unified security framework to orchestrate security products and thus offers better protection to a group of CIs. Since this framework collects and processes security-related data from multiple CI devices, our work focuses on protecting privacy by integrating our anonymization approach into it. Peer Reviewed. Postprint (published version).
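A privacy policy that groups anonymization mechanisms, as described in the CIPSEC abstract above, can be pictured as a mapping from log fields to mechanisms. The abstract does not list the mechanisms or the policy format, so this is a minimal sketch under assumptions: the policy dictionary, the three mechanisms (suppression, IP prefix truncation, keyed HMAC-SHA256 pseudonymization), the field names, and the fail-closed default for unlisted fields are all hypothetical choices, not the tool's actual design.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical privacy policy: log field -> anonymization mechanism.
# Fields not listed are dropped (fail-closed).
POLICY = {
    "timestamp": "keep",
    "event": "keep",
    "src_ip": "truncate_ip",
    "dst_ip": "truncate_ip",
    "username": "pseudonymize",
    "hostname": "pseudonymize",
    "password": "suppress",
}

def truncate_ip(value, v4_prefix=24, v6_prefix=48):
    """Generalize an IP address to its network prefix, e.g. 10.1.2.3 -> 10.1.2.0."""
    ip = ipaddress.ip_address(value)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)

def pseudonymize(value, key):
    """Keyed, consistent pseudonym: equal inputs map to equal tokens under the same key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon-" + digest[:16]

def anonymize_record(record, policy, key):
    """Apply the policy field by field; unknown fields and 'suppress' fields are removed."""
    out = {}
    for field, value in record.items():
        mechanism = policy.get(field, "suppress")
        if mechanism == "keep":
            out[field] = value
        elif mechanism == "truncate_ip":
            out[field] = truncate_ip(value)
        elif mechanism == "pseudonymize":
            out[field] = pseudonymize(value, key)
        elif mechanism != "suppress":
            raise ValueError(f"unknown mechanism {mechanism!r} for field {field!r}")
    return out

if __name__ == "__main__":
    key = b"replace-with-a-secret-key"  # hypothetical; never share it with log recipients
    log_line = {"timestamp": "2018-05-04T10:22:31Z", "event": "ssh_login_failed",
                "src_ip": "192.168.10.77", "username": "alice", "password": "hunter2"}
    print(anonymize_record(log_line, POLICY, key))
```

Keyed pseudonyms keep the same user or host linkable across shared logs (useful for correlating incidents) while making dictionary attacks depend on the secret key; the prefix lengths are tunable parameters of the hypothetical policy.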

    Measuring online tracking and privacy risks on Ecuadorian websites

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Online tracking has become a major enabler of mass surveillance and is now a critical vector for threats to user privacy. Despite the benefits of online tracking for personalized advertising, the complexity of the platforms involved makes it a threat to democracy. In this work, online tracking is measured in Ecuador, a country where the adoption of online advertising technologies is still developing, which has the highest Internet penetration rate in Latin America but lacks privacy regulation. By identifying the third-party connections triggered by the most popular Ecuadorian websites, we measure the concentration of online tracking in Ecuador. We also analyze its impact by studying particularities of government websites, the use of advanced tracking mechanisms, and the adoption of transparency practices by advertising platforms. Our ultimate aim is to expose potential privacy violations. This work was partly supported by the Spanish Ministry of Economy and Competitiveness (MINECO) through the project “MAGOS”, ref. TEC2017-84197-C4-3-R. J. Parra-Arnau was supported by the Spanish government under grant TIN2016-80250-R and by the Catalan government under grant 2017SGR 00705, and is currently the recipient of a Juan de la Cierva postdoctoral fellowship, IJCI-2016-28239, from the Spanish Ministry of Economy and Competitiveness. Peer Reviewed. Postprint (author's final draft).
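The counting step behind "the concentration of online tracking" in the Ecuador abstract above can be sketched as follows: for each crawled site, collect the domains it contacts, discard the first party, and report on what fraction of sites each third party appears. The abstract does not describe the crawler or the exact metric, so the input format, the prevalence metric, and the function names are assumptions; the small suffix set is a hypothetical stand-in for the full Public Suffix List, needed because many Ecuadorian sites use two-label suffixes such as `gob.ec`.

```python
from collections import Counter

# Hypothetical subset of multi-label public suffixes; a real crawl should use the
# full Public Suffix List instead.
MULTI_LABEL_SUFFIXES = {"com.ec", "gob.ec", "edu.ec", "org.ec", "net.ec"}

def registrable_domain(host):
    """Approximate the registrable domain, e.g. 'stats.g.doubleclick.net' -> 'doubleclick.net'."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])

def third_party_prevalence(crawl):
    """Map each third-party domain to the fraction of crawled sites that contact it.

    crawl: {site_host: [hosts contacted while loading that site]}
    """
    counts = Counter()
    for site, hosts in crawl.items():
        first_party = registrable_domain(site)
        counts.update({registrable_domain(h) for h in hosts} - {first_party})
    return {domain: n / len(crawl) for domain, n in counts.most_common()}

if __name__ == "__main__":
    crawl = {  # hypothetical crawl results
        "www.example.com.ec": ["static.example.com.ec", "www.google-analytics.com"],
        "www.example.gob.ec": ["www.google-analytics.com", "connect.facebook.net"],
        "news.example.ec": ["cdn.example.ec", "ad.doubleclick.net"],
    }
    for domain, share in third_party_prevalence(crawl).items():
        print(f"{domain}: {share:.0%} of sites")
```

Counting each third party at most once per site keeps heavy-traffic sites from dominating the ranking; a per-request count would answer a different question.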

    On the regulation of personal data distribution in online advertising platforms

    Online tracking is the key enabling technology of modern online advertising. In the recently established real-time bidding (RTB) model, the web pages tracked by ad platforms are shared with advertising agencies (also called DSPs), which, in an auction-based system, may bid for user ad impressions. Since tracking data are no longer confined to ad platforms, RTB poses serious privacy risks, especially with regard to user profiling, a practice that, as we show here, any DSP or related agency can conduct at very low cost. In this work, we illustrate these privacy risks by examining a data set with the real ad auctions of a DSP, and show that for at least 55% of the users tracked by this agency, it paid nothing for their browsing data. To mitigate this abuse, we propose a system that regulates the distribution of bid requests (containing user tracking data) to potentially interested bidders, depending on their previous behavior. In our approach, an ad platform restricts the sharing of tracking data by limiting the number of DSPs participating in each auction, leaving the current RTB architecture and protocols unchanged. However, doing so may noticeably affect the ad platform's revenue. The proposed system is therefore designed to maximize revenue while largely preventing abuse by DSPs. Experimental results suggest that our system is able to correct misbehaving DSPs and, consequently, enhance user privacy. Peer Reviewed. Postprint (author's final draft).
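Two ideas from the RTB abstract above lend themselves to a short sketch: measuring, from an auction log, the share of users a DSP tracked without ever paying, and restricting each auction to a limited number of DSPs ranked by their past behavior. The paper's system optimizes revenue under that restriction; the greedy ranking by a smoothed payment-per-request rate used here is a swapped-in heuristic, and the log format, prior values, and function names are hypothetical.

```python
from collections import defaultdict

def free_riding_share(bid_log, dsp):
    """Fraction of users seen by `dsp` in bid requests for whom it never paid.

    bid_log: iterable of (user_id, dsp_id, price_paid); price_paid is 0.0 when the
    DSP did not win (or did not bid on) that impression.
    """
    paid = defaultdict(float)
    for user, d, price in bid_log:
        if d == dsp:
            paid[user] += price
    if not paid:
        return 0.0
    return sum(1 for total in paid.values() if total == 0.0) / len(paid)

def select_bidders(stats, max_bidders, prior_requests=10.0, prior_rate=0.1):
    """Choose which DSPs receive the next bid request.

    stats: {dsp_id: (requests_received, total_paid)}. DSPs are ranked by a smoothed
    payment-per-request rate, so agencies that collect tracking data without paying
    for it drift down the ranking; the prior gives newcomers an initial chance.
    """
    def score(d):
        requests, paid = stats[d]
        return (paid + prior_rate * prior_requests) / (requests + prior_requests)
    return sorted(stats, key=score, reverse=True)[:max_bidders]

if __name__ == "__main__":
    bid_log = [("u1", "A", 0.0), ("u1", "A", 0.0), ("u2", "A", 0.5),
               ("u3", "A", 0.0), ("u3", "B", 1.2)]  # hypothetical auction log
    print(free_riding_share(bid_log, "A"))
    stats = {"A": (1000, 2.0), "B": (1000, 50.0), "C": (5, 0.0)}
    print(select_bidders(stats, max_bidders=2))
```

`max_bidders` embodies the trade-off the abstract mentions: fewer bidders per auction means less tracking data leaves the platform but also less price competition.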

    Mathematically optimized, recursive prepartitioning strategies for k-anonymous microaggregation of large-scale datasets

    © Elsevier. This manuscript version is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/). The technical contents of this work fall within the field of statistical disclosure control (SDC), which concerns the postprocessing of the demographic portion of the statistical results of surveys containing sensitive personal information, in order to effectively safeguard the anonymity of the participating respondents. A widely known technique for protecting respondents' privacy beyond the mere suppression of their identifiers is k-anonymous microaggregation. Unfortunately, most microaggregation algorithms that produce competitively low levels of distortion exhibit superlinear running time, typically scaling with the square of the number of records in the dataset. This work proposes and analyzes an optimized prepartitioning strategy that significantly reduces the running time of k-anonymous microaggregation on large datasets, with a mild loss in data utility relative to MDAV, the underlying method. The optimization strategy is based on recursively prepartitioning a dataset until the desired k-anonymity parameter is achieved. Traditional microaggregation algorithms have quadratic computational complexity, $\Theta(n^2)$. By using the proposed method and fixing the number of recursive prepartitions, we obtain subquadratic complexity of the form $\Theta(n^{3/2})$, $\Theta(n^{4/3})$, ..., depending on the number of prepartitions. Alternatively, by fixing the ratio between the size of the microcell and the macrocell in each prepartition, quasilinear complexity, $\Theta(n \log n)$, is achieved. Our method is readily applicable to large-scale datasets with numerical demographic attributes. Peer Reviewed. Postprint (author's final draft).
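The general shape of prepartitioned microaggregation described in the abstract above can be sketched as: recursively split the dataset into cells, then microaggregate each cell independently. The abstract does not specify the optimized splitting rule, so this sketch makes two labeled substitutions: cells are split at the median of the widest attribute (an assumed rule), and inside each cell a simple sort-and-chunk grouping stands in for MDAV. Function names and parameters such as `max_cell` are hypothetical, and attributes are assumed to be standardized so that ranges are comparable.

```python
import random

def widest_axis(indices, data):
    """Attribute with the largest range among the given records."""
    def spread(j):
        values = [data[i][j] for i in indices]
        return max(values) - min(values)
    return max(range(len(data[indices[0]])), key=spread)

def prepartition(indices, data, max_cell):
    """Recursively split at the median of the widest attribute until cells hold <= max_cell records."""
    if len(indices) <= max_cell:
        return [indices]
    axis = widest_axis(indices, data)
    ordered = sorted(indices, key=lambda i: data[i][axis])
    half = len(ordered) // 2
    return (prepartition(ordered[:half], data, max_cell)
            + prepartition(ordered[half:], data, max_cell))

def fixed_size_groups(cell, data, k):
    """Stand-in for MDAV inside one cell: sort along the widest attribute, cut into runs of k."""
    axis = widest_axis(cell, data)
    ordered = sorted(cell, key=lambda i: data[i][axis])
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)  # fold a short tail into the previous group
    return groups

def prepartitioned_microaggregation(data, k, max_cell):
    """Return groups of record indices, each with at least k records (needs len(data) >= k)."""
    if max_cell < 2 * k:
        raise ValueError("max_cell must be at least 2k so that every cell keeps >= k records")
    groups = []
    for cell in prepartition(list(range(len(data))), data, max_cell):
        groups.extend(fixed_size_groups(cell, data, k))
    return groups

if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10000)]  # hypothetical standardized records
    groups = prepartitioned_microaggregation(data, k=5, max_cell=200)
    print(len(groups), min(len(g) for g in groups))
```

The reason prepartitioning helps is simple arithmetic: if a quadratic algorithm such as MDAV runs separately inside n/c cells of at most c records each, its total cost is on the order of (n/c)·c² = n·c operations instead of n², at the price of never grouping records that fall in different cells.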

    Cognitive vulnerability in mental disorders

    ABSTRACT: Introduction: Modes of cognitive vulnerability were evaluated in outpatients of psychological services centers diagnosed with mental disorders. Objective: To establish the components of cognitive vulnerability in different mental disorders. Method: The participants were 490 users of psychological services centers at twelve universities in Colombia. To identify the presence or absence of mental disorders, they completed the MINI International Neuropsychiatric Interview. The Young Schema Questionnaire, the Core Beliefs Questionnaire for Personality Disorders, the Inventory of Automatic Thoughts, and the Coping Strategies Questionnaire were also administered. Logistic regression analysis was conducted to establish distinctive characteristics among current major depression, generalized anxiety disorder, panic disorder, social anxiety, and non-alcohol substance abuse. Results: The results showed distinct cognitive vulnerability profiles for each disorder. Conclusion: The hypothesis of cognitive specificity for the different mental disorders is confirmed.

    Patient preferences and treatment safety for uncomplicated vulvovaginal candidiasis in primary health care

    Abstract. Background: Vaginitis is a common complaint in primary care. In uncomplicated candidal vaginitis, there are no differences in effectiveness between oral and vaginal treatment. Some studies report that oral treatment is preferred, but a Cochrane review points out inconsistencies in how preferences were reported that limit the use of such data. Risk factors associated with recurrent vulvovaginal candidiasis remain controversial. Methods/Design: This work describes the protocol of a multicentre prospective observational study with one year of follow-up, which aims to describe women's reasons and preferences when choosing the route of administration (oral vs. topical) for the treatment of uncomplicated candidal vaginitis. The required sample is 765 women, selected by consecutive sampling, all aged 16 and over, with vaginal discharge and/or vaginal pruritus, and diagnosed with uncomplicated vulvovaginitis in primary care in Madrid. The main outcome variable is the patients' treatment preference; secondary outcome variables are time to symptom relief, adverse reactions, the frequency of recurrent vulvovaginitis, and its risk factors. For the main objective, the statistical analysis will comprise descriptive statistics for each variable, bivariate analysis, and multivariate analysis (logistic regression). The dependent variable is the type of treatment chosen (oral or topical), and the independent variables are those found to be associated with treatment preference in the bivariate analysis. Discussion: Clinical decisions, recommendations, and practice guidelines must attend not only to the best available evidence but also to the values and preferences of the informed patient.

    Effectiveness of an intervention for improving drug prescription in primary care patients with multimorbidity and polypharmacy: Study protocol of a cluster randomized clinical trial (Multi-PAP project)

    Background: Multimorbidity is associated with negative effects both on people's health and on healthcare systems. A key problem linked to multimorbidity is polypharmacy, which in turn is associated with an increased risk of partly preventable adverse effects, including mortality. The Ariadne principles describe a model of care based on a thorough assessment of diseases, treatments (and potential interactions), clinical status, context, and preferences of patients with multimorbidity, with the aim of prioritizing and sharing realistic treatment goals that guide individualized management. The aim of this study is to evaluate the effectiveness of a complex intervention that implements the Ariadne principles in a population of young-old patients with multimorbidity and polypharmacy. The intervention seeks to improve the appropriateness of prescribing in primary care (PC), as measured by the medication appropriateness index (MAI) score at 6 and 12 months, compared with usual care. Methods/Design: Design: pragmatic cluster randomized clinical trial. Unit of randomization: family physician (FP). Unit of analysis: patient. Scope: PC health centres in three autonomous communities: Aragon, Madrid, and Andalusia (Spain). Population: patients aged 65-74 years with multimorbidity (≥3 chronic diseases) and polypharmacy (≥5 drugs prescribed for ≥3 months). Sample size: n=400 (200 per study arm). Intervention: complex intervention based on the implementation of the Ariadne principles with two components: (1) FP training and (2) FP-patient interview.
Outcomes: MAI score, health services use, quality of life (EuroQol 5D-5L), pharmacotherapy and adherence to treatment (Morisky-Green, Haynes-Sackett), and clinical and socio-demographic variables. Statistical analysis: the primary outcome is the difference in MAI score between T0 and T1, with its 95% confidence interval. Adjustment for confounding factors will be performed by multilevel analysis. All analyses will be carried out according to the intention-to-treat principle. Discussion: It is essential to provide evidence on interventions for PC patients with polypharmacy and multimorbidity that are conducted in the context of routine clinical practice and involve young-old patients with significant potential for preventing negative health outcomes. This study was funded by the Fondo de Investigaciones Sanitarias ISCIII (Grant Numbers PI15/00276, PI15/00572, PI15/00996), REDISSEC (Project Numbers RD12/0001/0012, RD16/0001/0005), and the European Regional Development Fund ("A way to build Europe"). Trial registration: Clinicaltrials.gov, NCT02866799. Publisher PDF. Peer reviewed.